MIXING OF THE UPPER TRIANGULAR MATRIX WALK

Abstract. We study a natural random walk over the upper triangular
matrices, with entries in the field $\mathbb{Z}_2$, generated by steps which add
row $i+1$ to row $i$. We show that the mixing time of the lazy random
walk is $O(n^2)$, which is optimal up to constants. Our proof makes key
use of the linear structure of the group and extends to walks on the
upper triangular matrices over the fields $\mathbb{Z}_q$ for $q$ prime.



1. Introduction 

For $n \ge 2$ and $q$ a prime let $G_n(q)$ be the group of $n \times n$ upper triangular
matrices under multiplication with entries in the field $\mathbb{Z}_q$ and 1 along the
diagonal. The upper triangular walk on $G_n(2)$ is the Markov chain where
each step is given by choosing a uniformly random $1 \le i \le n-1$ and adding row $i+1$
to row $i$. The uniform distribution on $G_n(2)$ is stationary for this walk,
and our main result establishes the order of the mixing time, answering
a question of Stong [18] and of Arias-Castro, Diaconis and Stanley [3] and
Problem 14 of [14].
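As a minimal sketch of the step just described (not code from the paper), the following Python fragment runs lazy steps of the walk on $G_n(2)$. The helper names `identity`, `lazy_step` and `is_unit_upper_triangular` are illustrative assumptions, matrices are stored as lists of rows, and indices are 0-based.

```python
import random

def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def lazy_step(x, rng):
    """One lazy step on G_n(2): pick a uniform row i, then with probability 1/2
    add row i+1 to row i modulo 2 (otherwise leave the matrix unchanged)."""
    n = len(x)
    i = rng.randrange(n - 1)  # 0-based index standing for 1 <= i <= n-1
    if rng.random() < 0.5:
        x[i] = [(u + v) % 2 for u, v in zip(x[i], x[i + 1])]
    return x

def is_unit_upper_triangular(x):
    """True when the diagonal is all ones and everything below it is zero."""
    n = len(x)
    return (all(x[r][r] == 1 for r in range(n))
            and all(x[r][c] == 0 for r in range(n) for c in range(r)))

rng = random.Random(0)
x = identity(5)
for _ in range(200):
    lazy_step(x, rng)
```

Every step keeps the matrix inside $G_n(2)$, which is what `is_unit_upper_triangular(x)` checks.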



Theorem 1. There exists a constant $C > 0$ such that the mixing time of
the lazy upper triangular walk satisfies

\[ t_{\mathrm{mix}} \le C n^2. \]

This bound is tight up to constants as it is known that the mixing time
is at least of order $n^2$ (see e.g. [E]). Indeed the projection onto the rightmost
column of the matrix is itself a Markov chain given by an East model (see
Section 2.1), which has mixing time of order $n^2$.



1.1. Background. The random walks on $G_n(q)$ have received significant
attention. There are two natural extensions to fields $\mathbb{Z}_q$ for $q$ prime. We will
focus on the walk considered by Coppersmith and Pak [5, 16]. This walk is
given by the set of generators $\{I + aE_{i,i+1} : 1 \le i \le n-1,\ a \in \mathbb{Z}_q\}$ where $E_{i,j}$
denotes the $n \times n$ matrix with 1 at position $(i,j)$ and 0 everywhere else. We
denote this walk as $W_1$. An equivalent description is that each step entails
choosing a uniformly random $1 \le i \le n-1$ and $a \in \mathbb{Z}_q$ and adding row $i+1$
multiplied by $a$ to row $i$. This is an ergodic Markov chain reversible with
respect to the uniform distribution on $G_n(q)$. When $q = 2$ this corresponds
to the lazy version of our upper triangular walk of Theorem 1.

The other extension, which we will denote $W_2$, is the walk on $G_n(q)$ given
by generators $\{I \pm E_{i,i+1} : 1 \le i \le n-1\}$ which has been studied by
Diaconis, Saloff-Coste, Stong and others (HEKlH]. In the case $q = 2$ it is
exactly the upper triangular walk of Theorem 1. Random walks of the form
$W_2$ were first studied by Zack [19] in the case $n = 3$ as a random walk on
the Heisenberg group. In the setting of $q$ growing and $n$ fixed, Diaconis and
Saloff-Coste [7-9] introduced a number of sophisticated techniques giving
sharp rates of convergence.
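The equivalence between the generator description and the row-operation description of $W_1$ can be checked directly on a small case. The sketch below is illustrative only: the helper names `mat_mul`, `generator` and `add_row`, the sample matrix and the choice $q = 5$, $n = 4$ are assumptions, and indices are 0-based. The generators of $W_2$ correspond to the restricted choice $a \in \{1, q-1\}$.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def mat_mul(a, b, q):
    """Matrix product modulo q."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) % q for c in range(n)]
            for r in range(n)]

def generator(n, i, a):
    """I + a E_{i,i+1} with 0-based i, so the entry (i, i+1) equals a."""
    g = identity(n)
    g[i][i + 1] = a
    return g

def add_row(x, i, a, q):
    """Copy of x with a times row i+1 added to row i, modulo q."""
    y = [row[:] for row in x]
    y[i] = [(u + a * v) % q for u, v in zip(x[i], x[i + 1])]
    return y

q, n = 5, 4
x = [[1, 2, 0, 3],
     [0, 1, 4, 1],
     [0, 0, 1, 2],
     [0, 0, 0, 1]]
# Left multiplication by a generator agrees with the row operation for all i and a.
agree = all(mat_mul(generator(n, i, a), x, q) == add_row(x, i, a, q)
            for i in range(n - 1) for a in range(q))
```

For $q = 2$ the choice $a = 0$ leaves the matrix unchanged, which is why $W_1$ is the lazy version of the walk of Theorem 1.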

In the case of n growing the first results follow from work of Ellenberg [11] 
which gives an upper bound of order . This was subsequently improved 
by Stong who gave sharp bounds on the spectral gap, of order $n^{-1}q^{-2}$, which
translates into an upper bound on the mixing time of $n^3 q^2 \log q$.

The use of character theory has proved to be a key tool in analyzing
random walks on groups (see e.g. [6, 10]). One reason random walks on
$G_n(q)$ have resisted analysis is the lack of understanding of the character
theory of $G_n(q)$ which is "considered unknowable" [3]. Arias-Castro, Diaconis
and Stanley [3] approached the problem using the super-character theory
of Andre, Carter and Yan and gave a sharp analysis of a related chain whose
generators are the conjugacy classes of $I \pm E_{i,i+1}$. Using comparison, this
approach yielded an upper bound on the mixing time of O(n^ q^ log(n) log(q))
for the chain $W_2$.

Using a stopping time argument, Pak [16] showed that when q > 2n^, the 
mixing time of the walk Wi is 0(n^/^). Under the same assumption on the 
growth of q, this was subsequently improved by Coppersmith and Pak [5] 
to a mixing time of $O(n^2)$, which is optimal. By taking $q$ so large, they
can effectively assume that once an entry is updated to a non-zero entry it
remains non-zero.

Our work resolves the case of fixed q and also gives new bounds when q 
is growing with n. 

Theorem 2. There exists an absolute constant $C > 0$ such that the mixing
time of the lazy upper triangular walk $W_1$ on $G_n(q)$ satisfies

\[ t_{\mathrm{mix}} \le C n^2 \log q. \]

This yields new bounds when q < 2n^ and so is outside of the regime 
studied by Coppersmith and Pak. As in the case $q = 2$ there is a natural
lower bound of order $n^2$. We conjecture that the mixing time is of order $n^2$
for all $q$, which interpolates between our fixed $q$ result and Coppersmith and
Pak's large $q$ result.

Our proof crucially makes use of the linear structure of $G_n(q)$. When
multiplying by $I + aE_{n-1,n}$ we observe that the choice of $a$ only affects
the final column. Indeed we can write the off-diagonal portion of the final
column at time $t$ as $b_0 + \sum_{k=1}^{j} a_k b_k \in \mathbb{Z}_q^{n-1}$ where the $a_k \in \mathbb{Z}_q$ are
uniformly chosen independent of the $b_k$ and the remainder of the matrix. It
follows that the final column will be uniform and independent of the rest of
the matrix if the vectors $\{b_1, \ldots, b_j\}$ span $\mathbb{Z}_q^{n-1}$.
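The role of the spanning condition can be seen by brute force in a tiny case. The sketch below is illustrative and not from the paper: the function name `final_column_counts` and the sample vectors over $\mathbb{Z}_3$ are assumptions. It tallies $b_0 + \sum_k a_k b_k$ over every choice of coefficients, so a uniform choice of the $a_k$ produces a uniform vector exactly when all tallies are equal and every vector appears.

```python
from collections import Counter
from itertools import product

def final_column_counts(b0, bs, q):
    """Tally b0 + sum_k a_k b_k (mod q) over every choice of a_k in Z_q."""
    counts = Counter()
    for coeffs in product(range(q), repeat=len(bs)):
        v = tuple((b0[r] + sum(a * b[r] for a, b in zip(coeffs, bs))) % q
                  for r in range(len(b0)))
        counts[v] += 1
    return counts

q = 3
# Three vectors that span Z_3^2: every one of the 9 vectors is hit equally often.
spanning = final_column_counts((2, 1), [(1, 0), (1, 1), (0, 2)], q)
# Two parallel vectors: only a coset of a line is reachable.
degenerate = final_column_counts((2, 1), [(1, 1), (2, 2)], q)
```

In the spanning case the outcome is uniform on all of $\mathbb{Z}_3^2$ whatever $b_0$ is, while in the degenerate case only three vectors are reachable.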

We prove that for large enough $C$ and $t = Cn^2 \log q$ the $b_k$ do span
with high probability by expressing the vectors as projections at random
times of another matrix walk. This walk is a reflection of the original walk
involving column rather than row operations. By the linear structure of the
walk, the inner product of a fixed vector and the projection of the Markov
chain evolves as a site in the East model. Then using known spectral gap
estimates for the East model and large deviation results for Markov chains
we obtain the necessary bounds on when the $b_k$ span $\mathbb{Z}_q^{n-1}$. Iterating this
argument over all the columns completes the proof.

1.2. The East model. A key role is played by the one dimensional East
model which is an extensively studied example of a kinetically constrained
spin model (see [HIl] and references therein). Formally the East model with
parameter $0 < p < 1$ is a Markov chain whose state space is

\[ \mathcal{H}_n = \{h = (h_1, \ldots, h_n) \in \{0,1\}^n : h_1 = 1\}. \]

The dynamics are given as follows. In each step choose a uniformly random
$1 \le i \le n-1$ and

• If $h_i = 1$ set $h_{i+1}$ to 1 with probability $p$ and 0 with probability
$1-p$.

• If $h_i = 0$ do nothing.

The dynamics are reversible with respect to the i.i.d. Bernoulli measure
where the $h_i$ are 1 with probability $p$ for $2 \le i \le n$. When $q = 2$ the East
model with $p = 1/2$ describes the dynamics of a single column of the upper
triangular matrix walk.
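The update rule above can be written out as a short simulation. This is a minimal sketch under stated assumptions: the function names `east_update` and `east_step` are illustrative, configurations are Python lists with 0-based indices (so `h[0]` plays the role of $h_1$), and the run length and seed are arbitrary.

```python
import random

def east_update(h, i, p, rng):
    """Attempt an update of site i+1 (0-based), allowed only when h[i] == 1."""
    if h[i] == 1:
        h[i + 1] = 1 if rng.random() < p else 0
    return h

def east_step(h, p, rng):
    """One step of the discrete-time East model: choose i uniformly, then update."""
    i = rng.randrange(len(h) - 1)
    return east_update(h, i, p, rng)

rng = random.Random(0)
h = [1, 0, 0, 0, 0, 0]
for _ in range(1000):
    east_step(h, 0.5, rng)
```

The first site is never updated, so it stays equal to 1, and a site can only change while its left neighbour is 1; this is the kinetic constraint.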

2. Preliminaries 

It will be convenient for our proof to consider the continuous time version
of $W_1$ whose generator is given by

\[ (\mathcal{L}f)(x) = \sum_{i=1}^{n-1} \frac{1}{q} \sum_{a \in \mathbb{Z}_q} \bigl[ f\bigl((I + aE_{i,i+1})x\bigr) - f(x) \bigr]. \tag{2.1} \]

This version admits the following description: for each $1 \le i \le n-1$ we
have a rate 1 Poisson clock and when clock $i$ rings we choose a uniform $a \in \mathbb{Z}_q$ and
add row $i+1$ multiplied by $a$ to row $i$. Note that this speeds the chain up
by a factor of $n-1$ relative to the original. Let $X_t$ denote this Markov chain. The
continuous time representation has the useful property that for $1 \le n' \le n$
the process $X_t$ projected onto the $n' \times n'$ submatrix of its first $n'$ rows and
columns is itself a Markov chain whose law is given by the walk on $G_{n'}(q)$.
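The Poisson clock description translates into a simple simulation. The sketch below is an illustration rather than the authors' code: it assumes rate 1 clocks, merges the $n-1$ independent clocks into a single rate $n-1$ clock (a standard property of Poisson processes), and uses the hypothetical function name `simulate_continuous` with 0-based indices.

```python
import random

def simulate_continuous(n, q, t_end, rng):
    """Run the continuous-time walk from the identity up to time t_end (n >= 2).
    The n-1 independent rate 1 clocks are merged into one rate n-1 clock."""
    x = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    t, rings = 0.0, 0
    while True:
        t += rng.expovariate(n - 1)   # waiting time until the next ring
        if t > t_end:
            return x, rings
        i = rng.randrange(n - 1)      # which clock rang
        a = rng.randrange(q)          # uniform scalar in Z_q
        x[i] = [(u + a * v) % q for u, v in zip(x[i], x[i + 1])]
        rings += 1

x, rings = simulate_continuous(6, 3, 10.0, random.Random(0))
```

On average about $(n-1)t$ updates occur by time $t$, which is the factor $n-1$ speed-up relative to the discrete-time chain.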

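In this paper we consider mixing measured in the total variation distance.
For two probability measures $\nu_1, \nu_2$ on a countable space $\Omega$ the total
variation distance is defined as

\[ \|\nu_1 - \nu_2\|_{\mathrm{TV}} = \frac{1}{2} \sum_{x \in \Omega} |\nu_1(x) - \nu_2(x)|. \]

For an ergodic Markov chain $Y_t$ with stationary distribution $\nu$ we define

\[ t_{\mathrm{mix}}(\varepsilon) = \inf\{t : \max_y \|P_y(Y_t \in \cdot) - \nu\|_{\mathrm{TV}} \le \varepsilon\} \]

and the (worst case) mixing time is defined as $t_{\mathrm{mix}} = t_{\mathrm{mix}}(1/2e)$. We
denote the walk $W_1$ on $G_n(q)$ with generator given in (2.1) as $X_t = X^n_t$, its
mixing time as $t^n_{\mathrm{mix}}$ and we let $d_n(t)$ denote $\|P(X^n_t \in \cdot) - \pi^n\|_{\mathrm{TV}}$ where $\pi^n$ is
the uniform distribution on $G_n(q)$ and the stationary distribution of $X^n_t$.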
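For finite distributions the total variation distance is half the $\ell^1$ distance between the probability vectors, which is easy to compute directly. The following sketch is illustrative: the function name `tv_distance` and the dictionary representation are assumptions, not notation from the paper.

```python
def tv_distance(mu, nu):
    """Total variation distance between two distributions given as
    dicts mapping state -> probability (missing states have probability 0)."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)

fair = {"H": 0.5, "T": 0.5}
biased = {"H": 0.75, "T": 0.25}
distance = tv_distance(fair, biased)
```

Here `distance` equals 0.25, and two distributions with disjoint supports are at distance 1, the largest possible value.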

For an ergodic reversible Markov chain we denote the eigenvalues of the
generator as $0 = \lambda_1 < \lambda_2 \le \ldots$ and the spectral gap, the second eigenvalue,
as $\lambda = \lambda_2$. For any ergodic reversible finite Markov chain $\lambda^{-1} \le t_{\mathrm{mix}}$ (cf.,
e.g., [2, 14]). The mixing times of the lazy random walk and the continuous time
random walk are closely related. Theorem 20.3 of [14] implies that if the
mixing time of the continuous time random walk $W_1$ is $O(n \log q)$ then the
mixing time of the discrete time lazy version of the walk is $O(n^2 \log q)$. As
such for the rest of the paper we consider the continuous time version of the
walk.

We will make use of Hoeffding type bounds for Markov chains. There are
various versions including results of Gillman and sharper exponential
rates given by Leon and Perron [13]. We will make use of Theorem 3.4
of [15] by Lezaud which gives a continuous time Markov chain analogue. In
particular it shows that for a reversible Markov chain $Y_t$ with stationary
distribution $\nu$, and for any subset $A \subset \Omega$,

(2.2)

where $\nu_{\min} = \min\{\nu(x) : x \in \Omega,\ \nu(x) > 0\}$.

2.1. Continuous time East models. We will also use the continuous time
version of the East model. Its dynamics on $\mathcal{H}_n$ are given as follows. For
each $1 \le i \le n-1$ we have a rate 1 Poisson clock. When clock $i$ rings:

• If $h_i = 1$ set $h_{i+1}$ to 1 with probability $p$ and 0 with probability
$1-p$.

• If $h_i = 0$ do nothing.

Again the dynamics is reversible with respect to the i.i.d. Bernoulli measure
where the $h_i$ are 1 with probability $p$ for $2 \le i \le n$. Let $\theta_p(n)$ denote the
spectral gap of the East model of length $n$ with parameter $p$. Aldous and
Diaconis [1] showed that $\theta_p := \inf_n \theta_p(n) > 0$ for all $p$, establishing a bounded
spectral gap for the infinite dynamics, and also giving the asymptotics as $p$
tends to 0.

We now consider a natural $q$-state extension of the East model. Define
the state-space as

\[ \mathcal{H}_n^q = \{h = (h_1, \ldots, h_n) \in \mathbb{Z}_q^n : h_1 = 1\} \]

and again for each $1 \le i \le n-1$ we have rate 1 Poisson clocks so that when
clock $i$ rings:

• If $h_i \ne 0$ set $h_{i+1}$ to a uniform choice in $\mathbb{Z}_q$.

• If $h_i = 0$ do nothing.

Let $\lambda^q(n)$ denote the spectral gap of this process.

Proposition 2.1. The spectral gaps satisfy $\lambda^* := \inf_{q,n} \lambda^q(n) > 0$.

Proof. Let $Z_t$ denote the $q$-state East model and let $\psi : \mathcal{H}_n^q \to \mathcal{H}_n$ be given by

\[ \psi(h)_i = \begin{cases} 0 & \text{if } h_i = 0, \\ 1 & \text{otherwise.} \end{cases} \]

The process $Z'_t = \psi(Z_t)$ has the law of the standard East model with
parameter $p = \frac{q-1}{q}$.

We now consider an update of $Z_t$ where clock $i$ rings and $Z_t(i) \ne 0$. The
site $i+1$ is updated to 0 with probability $\frac{1}{q}$ and from a uniform choice
on $\{1, \ldots, q-1\}$ with probability $\frac{q-1}{q}$. In particular it is uniformly
distributed, conditional on the value $Z'_t(i+1)$. It is easily verified that this
conditional independence property holds at all times after such an update
and is independent of the process at other sites.
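The parameter $p = \frac{q-1}{q}$ and the conditional uniformity can be confirmed by exact enumeration of a single update. The sketch below is illustrative only: the function names `psi`, `projected_law` and `conditional_law` are assumptions, and `psi` acts on one site rather than on a whole configuration.

```python
from fractions import Fraction

def psi(v):
    """Projection of a single site: 0 stays 0, every non-zero value maps to 1."""
    return 0 if v == 0 else 1

def projected_law(q):
    """Exact law of psi(V) when V is uniform on Z_q."""
    law = {0: Fraction(0), 1: Fraction(0)}
    for v in range(q):
        law[psi(v)] += Fraction(1, q)
    return law

def conditional_law(q, bit):
    """Exact law of V given psi(V) == bit, for V uniform on Z_q."""
    values = [v for v in range(q) if psi(v) == bit]
    return {v: Fraction(1, len(values)) for v in values}

law5 = projected_law(5)
```

For $q = 5$ the projected site equals 1 with probability $4/5$, and given that it equals 1 the underlying value is uniform on $\{1, 2, 3, 4\}$.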

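So let $B(t)$ be the event that for each $1 \le i \le n-1$ there has been an
update from clock $i$ at a time $s$ when $Z_s(i) \ne 0$ before time $t$. It follows
that on the event $B(t)$ the variable $Z_t$ is uniformly distributed conditional
on the value of $Z'_t$. Hence we have that

\[ \|P(Z_t \in \cdot) - P(Z \in \cdot)\|_{\mathrm{TV}} \le \|P(Z'_t \in \cdot) - P(\psi(Z) \in \cdot)\|_{\mathrm{TV}} + P(B(t)^c), \tag{2.3} \]

where $Z$ is distributed according to the stationary distribution of $Z_t$. Since
the spectral gap of $Z'_t$ is $\theta_{\frac{q-1}{q}}(n)$ we have that

\[ \|P(Z'_t \in \cdot) - P(\psi(Z) \in \cdot)\|_{\mathrm{TV}} \le C(n,q) \exp\bigl(-t\,\theta_{\frac{q-1}{q}}(n)\bigr), \tag{2.4} \]

for some constant $C(n,q)$ as the spectral gap gives the asymptotic convergence
rate to stationarity. Fix some $1 \le i \le n-1$. Then equation (2.2)
implies that for some $c > 0$ and constant $C'(n,q)$

\[ P\Bigl( \int_0^t I(Z_s(i) \ne 0)\,ds < \tfrac{1}{3}t \Bigr) \le C'(n,q) \exp\bigl(-c\,\theta_{\frac{q-1}{q}}(n)\,t\bigr). \tag{2.5} \]

The times at which clock $i$ rings form a rate 1 Poisson process on $[0,t]$ which
is independent of the process $Z_t(i)$. Hence the chance that $B$ fails is given
by

\begin{align*}
P(B(t)^c) &\le P\Bigl( \exists\, 1 \le i \le n-1,\ \int_0^t I(Z_s(i) \ne 0)\,ds < \tfrac{1}{3}t \Bigr) + (n-1)\,P\bigl(\mathrm{Poisson}(\tfrac{1}{3}t) = 0\bigr) \tag{2.6} \\
&\le n\,C'(n,q) \exp\bigl(-c\,\theta_{\frac{q-1}{q}}(n)\,t\bigr) + n \exp(-\tfrac{1}{3}t). \tag{2.7}
\end{align*}

Combining equations (2.3), (2.4) and (2.6), and taking $c \le 1$ without loss of generality, we have

\[ \|P(Z_t \in \cdot) - P(Z \in \cdot)\|_{\mathrm{TV}} \le C(n,q) \exp\bigl(-c\,\theta_{\frac{q-1}{q}}(n)\,t\bigr) + n\,C'(n,q) \exp\bigl(-c\,\theta_{\frac{q-1}{q}}(n)\,t\bigr) + n \exp(-\tfrac{1}{3}t) \]

and so the exponential rate of decay is at least $c\,\theta_{\frac{q-1}{q}}(n) \wedge \tfrac{1}{3}$. Hence we have
that

\[ \lambda^q(n) \ge c\,\theta_{\frac{q-1}{q}}(n) \wedge \tfrac{1}{3}. \]

Theorem 4.2 of [1] establishes that for some $0 < p_0 < 1$, whenever $p_0 < p < 1$
the gap $\theta_p(n)$ is bounded below by a positive constant that does not depend on $p$ or $n$.
Combining this with the fact that $\inf_n \theta_p(n) > 0$
for all $p$ we have that

\[ \lambda^* = \inf_{q,n} \lambda^q(n) \ge \tfrac{1}{3} \wedge \inf_{q,n} c\,\theta_{\frac{q-1}{q}}(n) > 0, \]

which completes the proof. ■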

It was observed in [3] that the projection onto column $j$ of the upper
triangular walk on $G_n(q)$ is given by an East model, specifically the length $j$
$q$-state East model. Our proof makes use of this fact and additionally that
the observation also holds for linear combinations of columns.
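The East constraint on a single column can be checked empirically. In the sketch below, which is illustrative and uses the hypothetical function name `check_east_constraint` with 0-based indices, each step adds $a$ times row $i+1$ to row $i$, and the check confirms that within every column the entry in row $i$ changes only when the entry directly below it is non-zero. Reading column $j$ from the diagonal upwards, the diagonal entry plays the role of the fixed site $h_1 = 1$.

```python
import random

def check_east_constraint(n, q, steps, rng):
    """Simulate steps of W_1 and verify the East constraint in every column."""
    x = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    for _ in range(steps):
        i = rng.randrange(n - 1)
        a = rng.randrange(q)
        y = [row[:] for row in x]
        y[i] = [(u + a * v) % q for u, v in zip(x[i], x[i + 1])]
        for j in range(n):
            changed = [r for r in range(n) if x[r][j] != y[r][j]]
            if any(r != i for r in changed):
                return False          # only row i may change in a step
            if changed and x[i + 1][j] == 0:
                return False          # change without a non-zero neighbour below
            if y[j][j] != 1:
                return False          # the diagonal entry must stay fixed at 1
        x = y
    return True

constraint_holds = check_east_constraint(5, 3, 500, random.Random(0))
```

When the entry below is non-zero and $a$ is uniform on $\mathbb{Z}_q$, the updated entry is uniform on $\mathbb{Z}_q$, matching the $q$-state East update rule.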

3. Proof of Theorem 1

In this section we work exclusively with the continuous time version of
the Markov chain. We begin with some notation. By symmetry we will
assume that the Markov chain begins at $X_0 = I$. Fix some terminal time
$T$. First we split the clocks and associated row operations into those with
$1 \le i \le n-2$ and those with $i = n-1$. Let $0 < t_1 < t_2 < \cdots$ denote the
times at which the Poisson clocks ring for $1 \le i \le n-2$ and let $W_j$ denote
the associated row operation matrices. We denote the $\sigma$-algebra generated
by these updates as $\mathcal{F}$. Let $N(t) = \max\{j \ge 0 : t_j \le t\}$, with the convention $t_0 = 0$, and define the
backwards process on the interval $[0,T]$ by $Y_0 = I$ and

\[ Y_t = \prod_{j=0}^{N(T)-N(T-t)-1} W_{N(T)-j} = W_{N(T)} W_{N(T)-1} \cdots W_{N(T-t)+1} \]

and for $0 \le t \le t' \le T$ we let

\[ Y_{t,t'} = Y_t^{-1} Y_{t'} = W_{N(T-t)} W_{N(T-t)-1} \cdots W_{N(T-t')+1}. \]

The process $Y_t$ is a Markov process given by the following column dynamics
description. For each $1 \le i \le n-2$ we have a Poisson clock and when clock
$i$ rings we choose a uniform $a \in \mathbb{Z}_q$ and add column $i$ multiplied by $a$ to
column $i+1$. With this description column $n$ is fixed while the walk on the
top left $(n-1) \times (n-1)$ submatrix is equivalent to the walk $W_1$ on $G_{n-1}(q)$
up to a reflection of the matrix.

Next consider the second type of row operations which have $i = n-1$.
Let $s_1 < s_2 < \cdots$ denote the times at which row $n$ is added to row $n-1$ and
let $a_1, a_2, \ldots$ denote the associated scalars. Note that these are independent
of $\mathcal{F}$. Denote by

\[ J(t) = \max\{j \ge 0 : s_j \le t\} \]

the number of row operations with $i = n-1$ up to time $t$. The following
expansions of $X_t$ allow us to track the contribution of operations of the
second type, taking advantage of the linear nature of $G_n(q)$. By construction,
with the definitions above, we have that

\[ X_{s_\ell} = (I + a_\ell E_{n-1,n})\, Y_{T-s_\ell,\,T-s_{\ell-1}}\, X_{s_{\ell-1}} \tag{3.1} \]

for $1 \le \ell \le J(T)$ where $s_0 = 0$. We now show, by induction, that

\[ X_{s_\ell} = Y_{T-s_\ell,\,T} + \sum_{k=1}^{\ell} a_k\, Y_{T-s_\ell,\,T-s_k}\, E_{n-1,n}. \tag{3.2} \]

For $\ell = 0$ equation (3.2) is immediate and for any upper triangular matrix
$Y \in G_n(q)$ we have that

\[ E_{n-1,n} Y = E_{n-1,n} \quad \text{and} \quad E_{n-1,n} E_{n-1,n} = 0. \tag{3.3} \]

Then by equation (3.1)

\begin{align*}
X_{s_{\ell+1}} &= (I + a_{\ell+1} E_{n-1,n})\, Y_{T-s_{\ell+1},\,T-s_\ell}\, X_{s_\ell} \\
&= Y_{T-s_{\ell+1},\,T} + \sum_{k=1}^{\ell+1} a_k\, Y_{T-s_{\ell+1},\,T-s_k}\, E_{n-1,n}
\end{align*}

where the final equality follows by expanding the sum, applying equation (3.3)
and using the fact that $Y_{T-s_{\ell+1},\,T-s_{\ell+1}} = I$, completing the inductive step.
Finally, since $X_T = Y_{0,\,T-s_{J(T)}}\, X_{s_{J(T)}}$, it follows from equation (3.2) that

\[ X_T = Y_T + \sum_{k=1}^{J(T)} a_k\, Y_{T-s_k}\, E_{n-1,n}. \tag{3.4} \]
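The expansion (3.4), which writes $X_T$ as $Y_T$ plus a sum of terms $a_k Y_{T-s_k} E_{n-1,n}$, can be sanity-checked numerically. The sketch below is not from the paper: it replaces the Poisson times by an arbitrary ordered list of row operations (only the order matters for the identity), and the names `elementary` and `check_expansion` are assumptions. Scanning the operations backwards in time, `y` holds the product of the first-type operations that occur after the current one, which is the matrix $Y_{T-s_k}$ at the moment a second-type operation is met.

```python
import random

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def mat_mul(a, b, q):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) % q for c in range(n)]
            for r in range(n)]

def elementary(n, i, a):
    """I + a E_{i,i+1} with 0-based i."""
    g = identity(n)
    g[i][i + 1] = a
    return g

def check_expansion(n, q, num_ops, rng):
    # Row operations (i, a), listed in increasing time order.
    ops = [(rng.randrange(n - 1), rng.randrange(q)) for _ in range(num_ops)]
    # Left-hand side: X_T, applying each operation by left multiplication.
    x = identity(n)
    for i, a in ops:
        x = mat_mul(elementary(n, i, a), x, q)
    # Right-hand side: scan backwards in time.
    y = identity(n)
    total = [[0] * n for _ in range(n)]
    e_last = [[0] * n for _ in range(n)]
    e_last[n - 2][n - 1] = 1                 # E_{n-1,n} in 0-based indexing
    for i, a in reversed(ops):
        if i == n - 2:                       # second type: add a * y * E_{n-1,n}
            term = mat_mul(y, e_last, q)
            total = [[(t + a * v) % q for t, v in zip(tr, vr)]
                     for tr, vr in zip(total, term)]
        else:                                # first type: y gains a right factor
            y = mat_mul(y, elementary(n, i, a), q)
    rhs = [[(u + v) % q for u, v in zip(yr, tr)] for yr, tr in zip(y, total)]
    return x == rhs

all_agree = all(check_expansion(4, 3, 30, random.Random(seed)) for seed in range(20))
```

The check passes because of the two identities in (3.3): any product containing two or more factors $E_{n-1,n}$ vanishes, and first-type factors to the right of a single $E_{n-1,n}$ are absorbed.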

This expression allows us to separate the mixing in the first $n-1$ columns
from that of the final column. Let $\mathcal{M}$ denote the set of $n \times n$ matrices with
entries in $\mathbb{Z}_q$ and let $\mathcal{M}^*$ denote the subspace of matrices spanned by $E_{i,n}$
for $1 \le i \le n-1$. Each matrix $Y_{T-s_k} E_{n-1,n}$ is in $\mathcal{M}^*$. Let $A = A(T,n)$
denote the event that the collection $\{Y_{T-s_k} E_{n-1,n} : 1 \le k \le J(T)\}$ spans
$\mathcal{M}^*$. Since the $a_k$ are independent of the $Y_{T-s_k}$ it follows that on $A$ the
sum $\sum_{k=1}^{J(T)} a_k Y_{T-s_k} E_{n-1,n}$ is uniformly distributed on $\mathcal{M}^*$ and so we have
the following proposition.

Proposition 3.1. The distances to stationarity satisfy

\[ d_n(T) \le d_{n-1}(T) + P(A(T,n)^c). \]

Proof. Let $X$ be distributed according to the stationary distribution of $X_t$,
that is, uniformly on $G_n(q)$. As observed above, the Markov chain restricted
to the first $n-1$ columns of $X_t$ gives the walk $W_1$ on $G_{n-1}(q)$. Hence we may
couple the first $n-1$ columns of $X_T$ and $X$ except with probability $d_{n-1}(T)$.
Moreover the walk on the first $n-1$ columns is $\mathcal{F}$ measurable and independent
of the scalars $\{a_1, \ldots, a_{J(T)}\}$.

On the event $A(T,n)$ we have that $\sum_{k=1}^{J(T)} a_k Y_{T-s_k} E_{n-1,n}$ is uniformly
distributed on $\mathcal{M}^*$ and independent of $\mathcal{F}$. Hence we may couple the final
column of $X_T$ and $X$ independent of $\mathcal{F}$ when $A(T,n)$ holds. Hence we
may couple $X_T$ and $X$ except with probability $d_{n-1}(T) + P(A(T,n)^c)$ which
completes the proposition. ■

It remains to bound the probability of $A$. For this we use the fact that
$Y_t$ is a Markov chain and its first $n-1$ columns are given by the column
version of the walk $W_1$.

Proposition 3.2. There exist absolute constants $0 < c_1, c_2 < \infty$ such that

\[ P(A(T,n)) \ge 1 - e^{-c_1 (T - c_2 n \log q)}. \]

Proof. The vectors $\{Y_{T-s_k} E_{n-1,n} : 1 \le k \le J(T)\}$ span $\mathcal{M}^*$ if and only if
for each non-zero vector $b = (b_1, \ldots, b_{n-1}, 0) \in \mathbb{Z}_q^n \setminus \{(0, \ldots, 0)\}$ we have
that for some $1 \le k \le J(T)$

\[ b \cdot Y_{T-s_k} e_{n-1} \ne 0. \tag{3.5} \]

Let $l = \min\{j : b_j \ne 0\}$. We may assume without loss of generality that $b_l = 1$
since multiplying $b$ by a non-zero scalar does not affect (3.5).
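The equivalence used here is the standard fact that a family of vectors spans $\mathbb{Z}_q^m$ exactly when no non-zero vector $b$ is orthogonal to all of them. The sketch below illustrates it on small examples; the function names `rank_mod_q` and `no_annihilator` and the sample vectors are assumptions, and the modular inverse via `pow(x, -1, q)` requires Python 3.8 or later and a prime $q$.

```python
from itertools import product

def rank_mod_q(vectors, q):
    """Rank over Z_q (q prime) of a list of vectors, by Gaussian elimination."""
    rows = [list(v) for v in vectors]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] % q), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, q)    # modular inverse of the pivot
        rows[rank] = [(inv * v) % q for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col] % q:
                f = rows[r][col]
                rows[r] = [(u - f * v) % q for u, v in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def no_annihilator(vectors, q, dim):
    """True when every non-zero b has b . v != 0 (mod q) for some v in the family."""
    for b in product(range(q), repeat=dim):
        if any(b) and all(sum(x * y for x, y in zip(b, v)) % q == 0 for v in vectors):
            return False
    return True

spanning = [(1, 0), (1, 1)]
degenerate = [(1, 1), (2, 2)]
```

With $q = 3$, the family `spanning` has rank 2 and no annihilating vector, while `degenerate` has rank 1 and is annihilated by $b = (1, 2)$.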

Now recall our observation that $Y_t$ is the dynamics on upper triangular
matrices where for each $1 \le i \le n-2$ we add column $i$ times a random
scalar $a$ to column $i+1$ according to the times of rate 1 Poisson clocks. The
process $Z_t = b \cdot Y_t$ is also a Markov chain on vectors of length $n$ with the
following description. For each $1 \le i \le n-2$ at rate 1 we add position $i$
times a random scalar $a$ to position $i+1$. Its initial condition is $Z_0 = b$ and
so the value of its first $l-1$ entries and the final entry are fixed as 0. The
value in entry $l$ is fixed at 1 and entries $l$ to $n-1$ perform the $q$-state East
model of length $n-l$. By Proposition 2.1 this chain has spectral gap at
least $\lambda^* > 0$. Its stationary distribution is the uniform distribution over the $q^{n-l-1}$ possible
vectors and so applying equation (2.2) we have that

\[ P\Bigl( \int_0^T I(Z_t e_{n-1} \ne 0)\,dt < \tfrac{1}{3}T \Bigr) \le q^{n/2} \exp(-c\lambda^* T) \tag{3.6} \]

where $I(\cdot)$ denotes the indicator and $c$ is an absolute constant. The times
$T - s_k$ form a rate 1 Poisson process on $[0,T]$ which is independent of the
process $Y_t$. Hence

\begin{align*}
P\bigl(\forall\, 1 \le k \le J(T),\ Z_{T-s_k} e_{n-1} = 0\bigr)
&\le P\Bigl( \int_0^T I(Z_t e_{n-1} \ne 0)\,dt < \tfrac{1}{3}T \Bigr) + P\bigl(\mathrm{Poisson}(\tfrac{1}{3}T) = 0\bigr) \\
&\le q^{n/2} \exp(-c\lambda^* T) + \exp(-\tfrac{1}{3}T). \tag{3.7}
\end{align*}

Taking a union bound over $b$ completes the proof. ■

We are now ready to prove Theorem 2 which in turn proves Theorem 1.

Proof of Theorem 2. As noted above it is sufficient by Theorem 20.3 of [14]
to prove an upper bound of $O(n \log q)$ on the mixing time of the continuous
time chain $W_1$. Combining Propositions 3.1 and 3.2 we have that

\[ d_n(T) \le d_{n-1}(T) + e^{-c_1 (T - c_2 n \log q)} \]

and hence by induction

\[ d_n(T) \le \sum_{i=2}^{n} e^{-c_1 (T - c_2 i \log q)}. \]

For $c > c_2$ we have that $d_n(cn \log q) = o(1)$ which completes the proof. ■
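The last step can be unpacked as follows. This is a sketch of the routine estimate rather than text from the paper; it assumes $T = cn \log q$ with $c > c_2$ and uses only $i \le n$ in each summand of the bound $d_n(T) \le \sum_{i=2}^{n} e^{-c_1(T - c_2 i \log q)}$, together with $q \ge 2$.

```latex
\begin{align*}
d_n(cn \log q)
  &\le \sum_{i=2}^{n} \exp\bigl(-c_1 (cn - c_2 i) \log q\bigr) \\
  &\le n \exp\bigl(-c_1 (c - c_2)\, n \log q\bigr)
   = n\, q^{-c_1 (c - c_2)\, n} \longrightarrow 0
   \quad \text{as } n \to \infty .
\end{align*}
```

The decay is exponential in $n$, so the polynomial factor $n$ from the number of summands is harmless.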